Pipeline Modifications for variable Audio and Sampler Generalisation to different Download Tasks #110

khumairraj · 2021-11-11T07:10:29Z

No description provided.


hearpreprocess/pipeline.py

hearpreprocess/sampler.py

turian · 2021-11-12T04:55:23Z

hearpreprocess/sampler.py

+        necessary_keys = sampler_config["necessary_keys"]
+
+        def requires(self):
+            return _get_download_and_extract_tasks(self.task_config)


Hrmmm, really? I mean, this is the sampler, so it's not core code, it's just for testing, but this seems wrong and smells bad.

requires() in Luigi is supposed to return Luigi tasks, not do work. run() is where work should happen. Otherwise you can end up with weird Luigi bugs that are hard to debug.

_get_download_and_extract_tasks returns the tasks that will do the downloading and extraction, so technically requires() still returns a list of tasks. This is also how our main pipeline works: we use this function to build the tasks, pass them to ExtractMetadata as a Luigi parameter, and return them from its requires(). As discussed in the previous comment, the main reason to do it this way here is that _get_download_and_extract_tasks is different for each task.
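The distinction in this thread (requires() only returns task objects, run() does the work) can be illustrated with a toy model. This is a hypothetical sketch that does not import Luigi: Task and build() are simplified stand-ins for luigi.Task and the Luigi scheduler, and get_download_and_extract_tasks, DownloadTask and the URLs are made up for illustration. The point is that building the task list is cheap and side-effect free, while all work happens in run().

```python
from typing import List, Optional, Set


class Task:
    """Toy stand-in for luigi.Task; this is NOT the real Luigi API."""

    def requires(self) -> List["Task"]:
        # Only declares dependencies; it must not perform any work.
        return []

    def run(self) -> None:
        # All real work belongs here.
        pass


class DownloadTask(Task):
    def __init__(self, url: str, log: List[str]) -> None:
        self.url = url
        self.log = log

    def run(self) -> None:
        self.log.append(f"download {self.url}")


class ExtractMetadata(Task):
    def __init__(self, outfiles: List[Task], log: List[str]) -> None:
        # Dependency tasks are built elsewhere and passed in as a parameter.
        self.outfiles = outfiles
        self.log = log

    def requires(self) -> List[Task]:
        # Cheap: hands back already-constructed task objects, no I/O.
        return self.outfiles

    def run(self) -> None:
        self.log.append("extract metadata")


def get_download_and_extract_tasks(urls: List[str], log: List[str]) -> List[Task]:
    """Hypothetical task-specific factory: constructs tasks but does no work."""
    return [DownloadTask(url, log) for url in urls]


def build(task: Task, done: Optional[Set[int]] = None) -> None:
    """Toy depth-first scheduler: run each dependency once, then the task."""
    if done is None:
        done = set()
    if id(task) in done:
        return
    for dep in task.requires():
        build(dep, done)
    task.run()
    done.add(id(task))
```

Constructing the tasks leaves the log empty; only build() triggers the run() methods, dependencies first and ExtractMetadata last.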

turian · 2021-11-12T04:56:16Z

hearpreprocess/sampler.py

+        "get_download_and_extract_tasks"
+    ]
+
+    class RandomSampleOriginalDataset(_RandomSampleOriginalDataset):


Why do we have RandomSampleOriginalDataset and _RandomSampleOriginalDataset? Why can't we just have only RandomSampleOriginalDataset?

The main reason for doing it like this is that RandomSampleOriginalDataset requires get_download_and_extract_tasks, which is a task-specific function that returns the tasks that download and extract that task's data. So _RandomSampleOriginalDataset is subclassed inside get_sampler_task as RandomSampleOriginalDataset, and the task-specific download tasks are added there. Inside get_sampler_task I cannot refer to both the global _RandomSampleOriginalDataset and the local RandomSampleOriginalDataset if they share the same name, so they need different names.

turian · 2021-11-12T04:57:07Z

hearpreprocess/util/audio.py

@@ -160,6 +160,9 @@ def get_audio_dir_stats(
            all_file_paths,
        )
    )
+    if len(audio_paths) == 0:


Wait, why would this happen? Why is this okay?

If there is no audio in a downloaded directory, we should return nothing. This can happen, for example, if a downloaded directory contains only metadata files: it still has to be in the requires() of ExtractMetadata so that the task can actually run, and this function runs on every downloaded directory in the requires(). This is not a problem, because there is still another assert below: if audio files are found but the stats are not calculated, the assert will fail; if no audio files are found, as in this case, the function returns an empty dict.
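The behaviour described in the reply can be sketched as follows. This is a minimal, hypothetical version of get_audio_dir_stats, not the real one: it assumes audio files are identified by extension (AUDIO_EXTENSIONS is made up) and reports a file count and mean file size as placeholders for real audio statistics. It shows the two cases from the reply: a metadata-only directory returns an empty dict, and a directory with audio must produce a statistic for every file or the assert fails.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical extension list; the real pipeline may identify audio differently.
AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3", ".ogg"}


def get_audio_dir_stats(in_dir: str) -> Dict[str, float]:
    """Sketch: return {} for a directory that contains no audio files."""
    audio_paths = sorted(
        p
        for p in Path(in_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    # Metadata-only download directories are valid inputs; skip them early.
    if len(audio_paths) == 0:
        return {}

    # Placeholder per-file statistic (size in bytes) standing in for real
    # audio measurements such as duration, to keep the sketch dependency-free.
    sizes: List[int] = [p.stat().st_size for p in audio_paths]
    # Audio was found, so a statistic must exist for every file.
    assert len(sizes) == len(audio_paths), "audio found but stats are missing"
    return {
        "count": float(len(sizes)),
        "mean_bytes": sum(sizes) / len(sizes),
    }
```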

turian · 2021-11-12T04:57:45Z

hearpreprocess/util/task_config.py

+            # should also be None and no subsampling will be done
+            if task_config["sample_duration"] is None:
+                schema["max_task_duration_by_split"] = Schema(
+                    {split: Or(int, float, None) for split in SPLITS}


This should be None, not Or(int, float, None).
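Reading the diff comment and the review together, the intended rule seems to be: when sample_duration is None (audio kept at its actual length, with no trim or pad), max_task_duration_by_split must be None for every split, so no subsampling is done. Below is a plain-Python sketch of that check, assuming a dict-shaped task config; it does not use the schema library, and the split names in SPLITS are hypothetical.

```python
from typing import Dict, Optional, Union

# Hypothetical split names; the real SPLITS constant is defined in hearpreprocess.
SPLITS = ["train", "valid", "test"]

Number = Union[int, float]


def validate_max_task_duration(task_config: Dict) -> None:
    """Plain-Python sketch of the rule (not the `schema` library).

    If sample_duration is None, every max_task_duration_by_split value must
    also be None. Otherwise each value may be an int, a float or None.
    """
    durations: Dict[str, Optional[Number]] = task_config["max_task_duration_by_split"]
    if sorted(durations) != sorted(SPLITS):
        raise ValueError(f"expected splits {SPLITS}, got {sorted(durations)}")
    for split, value in durations.items():
        if task_config["sample_duration"] is None:
            if value is not None:
                raise ValueError(f"{split}: must be None when sample_duration is None")
        elif value is not None and not isinstance(value, (int, float)):
            raise ValueError(f"{split}: expected int, float or None, got {value!r}")
```

With this rule, a config that keeps the actual sample duration but still sets a numeric duration limit for some split is rejected instead of being silently accepted.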

khumairraj added 5 commits November 11, 2021 06:47

Modify audio stats func to avoid failing at directories without audio

b7e122a

Modify schema to allow for actual sample duration without any trim and pad

4b65a9a

Modify sampler to work for custom download functions

498ddfb

Modify pipeline for variable audio size

b5b3719

Add safecopy utility function

e3f15fc

khumairraj changed the title ~~Pipeline Modifications for variable Audio and Sampler Generalisation to different Download Tasks~~ [WIP] - Pipeline Modifications for variable Audio and Sampler Generalisation to different Download Tasks Nov 11, 2021

turian reviewed Nov 12, 2021


hearpreprocess/pipeline.py (resolved)

Add description

0ca020b

khumairraj changed the title ~~[WIP] - Pipeline Modifications for variable Audio and Sampler Generalisation to different Download Tasks~~ Pipeline Modifications for variable Audio and Sampler Generalisation to different Download Tasks Nov 16, 2021

turian reviewed Nov 16, 2021


khumairraj and others added 6 commits November 17, 2021 04:08

Update comment and reverse variable name

466ff28

Modify task config

28e8f28

Merge branch 'main' into pipeline_changes

b1ae11e

imports

e17732c

flake8

34992b1

type fix

8a56ace

turian merged commit 943439c into main Nov 18, 2021

turian deleted the pipeline_changes branch November 18, 2021 22:47


khumairraj commented Nov 11, 2021

turian Nov 12, 2021

khumairraj Nov 17, 2021

turian Nov 18, 2021

turian Nov 12, 2021

khumairraj Nov 17, 2021

turian Nov 12, 2021

khumairraj Nov 17, 2021

turian Nov 18, 2021

turian Nov 12, 2021

khumairraj Nov 17, 2021
